Figure 6.16 shows the dot plot of the first ChIP-Seq sample.

The IDs, descriptions, and statistics of the significant GO terms are stored in “chip1_GO.csv”, “chip2_GO.csv”, and “chip3_GO.csv”. In Figure 6.16, the top ten GO terms are all associated with gene transcription, which reflects the biological activity of RNA polymerase II (Pol II). The definitions of the GO terms can be looked up at “http://www.informatics.jax.org/vocab/gene_ontology/”. In this way, ChIP-Seq provides information about the functions of the protein studied.
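As a quick check of the saved results, the GO table can be read back into R and sorted by adjusted p-value. The sketch below is only an illustration: it writes a small toy table to a temporary file instead of using “chip1_GO.csv”, and the column names (ID, Description, p.adjust) and all values are assumptions rather than output from the analysis.

```r
# Toy stand-in for "chip1_GO.csv"; IDs, descriptions, and values are made up
toy_go <- data.frame(
  ID          = c("id_1", "id_2", "id_3"),        # placeholder term IDs
  Description = c("term A", "term B", "term C"),  # placeholder descriptions
  p.adjust    = c(0.030, 0.001, 0.012),           # placeholder adjusted p-values
  stringsAsFactors = FALSE
)
go_file <- tempfile(fileext = ".csv")
write.csv(toy_go, go_file, row.names = FALSE)

# Read the table back and keep up to ten terms with the smallest adjusted p-values
go_terms  <- read.csv(go_file, stringsAsFactors = FALSE)
top_terms <- head(go_terms[order(go_terms$p.adjust),
                           c("ID", "Description", "p.adjust")], 10)
print(top_terms)
```

With a real results file, “go_file” would simply be replaced by the path to the saved CSV, provided its columns carry the names assumed here.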

We can also use the KEGG database of gene pathways to annotate the genes with significant peaks. The “enrichKEGG()” function returns the enriched KEGG categories with FDR control. The following code generates the KEGG pathway annotation and creates a dot plot for each sample (Figure 6.17):

#Chip1
ekegg1 <- enrichKEGG(gene = entrez1, organism = 'hsa',
                     pvalueCutoff = 0.05)
cluster_kegg1 <- data.frame(ekegg1)
write.csv(cluster_kegg1, "kegg_chip1.csv")
dotplot(ekegg1)

#Chip2
ekegg2 <- enrichKEGG(gene = entrez2, organism = 'hsa',
                     pvalueCutoff = 0.05)
cluster_kegg2 <- data.frame(ekegg2)
write.csv(cluster_kegg2, "kegg_chip2.csv")
dotplot(ekegg2)

#Chip3
ekegg3 <- enrichKEGG(gene = entrez3, organism = 'hsa',
                     pvalueCutoff = 0.05)
cluster_kegg3 <- data.frame(ekegg3)
write.csv(cluster_kegg3, "kegg_chip3.csv")
dotplot(ekegg3)

The significant KEGG signaling pathways show the most likely active pathways in the cells.
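To make the phrase “with FDR control” concrete, the following base R sketch runs an over-representation test on made-up counts and then adjusts the p-values with the Benjamini-Hochberg method. The counts and pathway names are hypothetical, and the sketch illustrates the general idea of a hypergeometric enrichment test rather than the exact internals of “enrichKEGG()”.

```r
# Hypothetical counts, chosen only for illustration
background_size <- 20000   # all annotated genes
query_size      <- 500     # genes with significant peaks

pathways <- data.frame(
  pathway      = c("pathway_A", "pathway_B", "pathway_C"),  # placeholder names
  pathway_size = c(150, 80, 300),  # genes annotated to each pathway
  overlap      = c(12, 2, 9),      # query genes that fall in each pathway
  stringsAsFactors = FALSE
)

# Probability of seeing at least `overlap` pathway genes among the query genes
# if the query genes were drawn at random from the background
pathways$pvalue <- phyper(pathways$overlap - 1,
                          pathways$pathway_size,
                          background_size - pathways$pathway_size,
                          query_size,
                          lower.tail = FALSE)

# The Benjamini-Hochberg adjustment controls the false discovery rate
pathways$p.adjust <- p.adjust(pathways$pvalue, method = "BH")

print(pathways[order(pathways$p.adjust), ])
```

A pathway would then be called significant when its adjusted p-value falls below the chosen cutoff, such as the 0.05 used above.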

We can also compare enrichment across samples by using the “compareCluster()” function, which requires a list of genes from each sample (Figure 6.18).

# Create a list with genes from each sample
genes <- lapply(annotated_peaks, function(i) as.data.frame(i)$geneId)

# Run KEGG analysis
compKEGG <- compareCluster(geneCluster = genes,
                           fun = "enrichKEGG",
                           organism = "hsa",  # KEGG organism code for human
                           pvalueCutoff = 0.05,
                           pAdjustMethod = "BH")
dotplot(compKEGG, showCategory = 10,
        title = "KEGG Pathway Enrichment Analysis")
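The object passed to “geneCluster” is a named list with one vector of gene IDs per sample, and the names of the list become the sample labels in the comparison. The base R sketch below builds such a list from toy data frames. The sample names and the gene IDs are placeholders, and in the real analysis “annotated_peaks” would hold the peak annotation results rather than hand-made tables.

```r
# Toy stand-ins for the annotated peak tables (assumption: each one can be
# converted to a data frame that has a geneId column)
annotated_peaks <- list(
  chip1 = data.frame(geneId = c("1017", "1019", "7157"), stringsAsFactors = FALSE),
  chip2 = data.frame(geneId = c("1017", "672"), stringsAsFactors = FALSE),
  chip3 = data.frame(geneId = c("7157", "672", "5925"), stringsAsFactors = FALSE)
)

# One character vector of gene IDs per sample; the list names are kept
genes <- lapply(annotated_peaks, function(i) as.data.frame(i)$geneId)

str(genes)       # structure of the named list
lengths(genes)   # number of genes contributed by each sample
```

Checking the structure in this way before running the comparison helps catch an unnamed list or an empty gene vector early.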